Search - html web

Search list

[Internet-Network] Writing an HTML File Analyzer in Java

Description:

Writing an HTML File Analyzer in Java

 I. Overview

    

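    The core job of a Web server is to correctly analyze the tags in an HTML file, and the interpreter of a programming language likewise analyzes the reserved words in a source file before interpreting them. In practice, we often need to perform keyword analysis on a particular type of file. For example, to download an HTML file together with its related .gif, .class, and other files, we must separate out the tags in the HTML file and find the required file names and directories. Before Java, this kind of work meant analyzing every character in the file to pick out the needed parts, which required a lot of code and was error-prone. In a recent project, the author used Java's input stream class StreamTokenizer to analyze HTML files, with good results. Here, we want to download an HTML file from a known Web page, analyze it, and then download the HTML files (if the page uses frames), image files, and class (Java Applet) files that the page contains.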

    

    II. StreamTokenizer

    

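    StreamTokenizer, a tokenizing input stream, turns an input stream into a stream of tokens. The tokens fall into three categories: words (multi-character tokens), single-character tokens, and whitespace (including Java- and C/C++-style comments).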

    

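    The constructor of the StreamTokenizer class is: StreamTokenizer(InputStream in). This constructor has been deprecated since JDK 1.1 in favor of StreamTokenizer(Reader r), which is the form the listing in section III uses.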

    

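    The class has several public instance variables: ttype, sval, and nval, which hold the token type, the current string value, and the current numeric value, respectively. To obtain the characters between tokens (that is, between the tags in the HTML), read the variable sval. To advance to the next token, call nextToken(). The nextToken() method returns an int. An ordinary single-character token is returned as the character's own code; otherwise there are four possible return values: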

    

    StreamTokenizer.TT_NUMBER: the token read is a number; its value is a double and can be read from the instance variable nval.

    

    StreamTokenizer.TT_WORD: the token read is a non-numeric word (other characters may be included in it); the word can be read from the instance variable sval.

    

    StreamTokenizer.TT_EOL: the token read is an end-of-line marker (reported only after eolIsSignificant(true) has been called).

    

    If the end of the stream has been reached, nextToken() returns TT_EOF.

    

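    Before calling nextToken(), set up the stream's syntax table so that the tokenizer can distinguish different kinds of characters. The whitespaceChars(int low, int hi) method defines the range of characters that carry no meaning. The wordChars(int low, int hi) method defines the range of characters that make up words.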
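    The token types described above can be seen in a small experiment. The following is a minimal sketch and not part of the original article: the class name TokenDemo and the sample input are made up for illustration. It relies on the default syntax table and calls eolIsSignificant(true) so that TT_EOL is actually reported.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists every token StreamTokenizer finds in a string.
class TokenDemo {
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true); // report line ends as TT_EOL instead of skipping them
        List<String> out = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                switch (type) {
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER " + st.nval); // numeric value is a double
                        break;
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD " + st.sval);   // word text is in sval
                        break;
                    case StreamTokenizer.TT_EOL:
                        out.add("EOL");
                        break;
                    default:
                        out.add("CHAR " + (char) type); // ordinary single-character token
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("width = 42\nheight")) {
            System.out.println(line);
        }
    }
}
```

    For the sample input, the '=' sign arrives as an ordinary character because the default syntax table assigns it no special role, and the number is delivered through nval rather than sval.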

    

    III. Implementation

    

    1. Implementing the HtmlTokenizer class

    

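    Before analyzing a token stream, first set up its syntax table; in this example, that means letting the program tell which words are HTML tags. Below is the definition of a token stream class for the HTML tags we need; it is a subclass of StreamTokenizer: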

    

    

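    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StreamTokenizer;

    class HtmlTokenizer extends StreamTokenizer {
        // Token codes. Only the tags needed in this example are defined;
        // extend the list as required.
        static final int HTML_TEXT = -1;
        static final int HTML_UNKNOWN = -2;
        static final int HTML_EOF = -3;
        static final int HTML_IMAGE = -4;
        static final int HTML_FRAME = -5;
        static final int HTML_BACKGROUND = -6;
        static final int HTML_APPLET = -7;

        boolean outsideTag = true; // true while the tokenizer is outside a <...> tag

        // Constructor: defines the syntax table of this token stream.
        public HtmlTokenizer(BufferedReader r) {
            super(r);
            this.resetSyntax();     // reset the syntax table
            this.wordChars(0, 255); // every character may be part of a word
            this.ordinaryChar('<'); // delimiters on either side of an HTML tag
            this.ordinaryChar('>');
        } // end of constructor

        public int nextHtml() {
            int token; // the current token
            try {
                switch (token = this.nextToken()) {
                case StreamTokenizer.TT_EOF:
                    // end of the stream reached
                    return HTML_EOF;
                case '<': // entering a tag
                    outsideTag = false;
                    return nextHtml();
                case '>': // leaving a tag
                    outsideTag = true;
                    return nextHtml();
                case StreamTokenizer.TT_WORD:
                    // the current token is a word; decide which tag it is
                    if (allWhite(sval))
                        return nextHtml(); // skip whitespace-only words
                    else if (sval.toUpperCase().indexOf("FRAME") != -1 && !outsideTag)
                        return HTML_FRAME; // FRAME tag
                    else if (sval.toUpperCase().indexOf("IMG") != -1 && !outsideTag)
                        return HTML_IMAGE; // IMG tag
                    else if (sval.toUpperCase().indexOf("BACKGROUND") != -1 && !outsideTag)
                        return HTML_BACKGROUND; // BACKGROUND attribute
                    else if (sval.toUpperCase().indexOf("APPLET") != -1 && !outsideTag)
                        return HTML_APPLET; // APPLET tag
                    // No match: falls through to default, as in the original listing,
                    // so plain text and other tags are reported as HTML_UNKNOWN.
                default:
                    System.out.println("Unknown tag: " + token);
                    return HTML_UNKNOWN;
                } // end of switch
            } catch (IOException e) {
                System.out.println("Error:" + e.getMessage());
            }
            return HTML_UNKNOWN;
        } // end of nextHtml

        // Returns true if the string consists only of whitespace.
        protected boolean allWhite(String s) {
            // The original article omits this body; the line below is an
            // assumed minimal implementation, not the author's code.
            return s.trim().length() == 0;
        } // end of allWhite

    } // end of class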

    

    The method above passed testing in a recent project; the operating system was Windows NT4 and the development tool was Inprise Jbuilder3.
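    The article stops at recognizing tags and does not show how the file names are pulled out of them. The following is a minimal sketch of that next step under stated assumptions: the class ResourceScanner and its methods are hypothetical names, the syntax table is set up the same way as in HtmlTokenizer, and the attribute parsing is deliberately naive (it expects name=value with no spaces around the equals sign and does not distinguish "src" from an attribute whose name merely ends in "src").

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: collects the files referenced by IMG, FRAME, APPLET and BODY tags.
class ResourceScanner {
    /** Returns the value of the named attribute inside one tag's text, or null if absent. */
    static String attribute(String tag, String name) {
        String key = name + "=";
        for (int i = 0; i + key.length() < tag.length(); i++) {
            if (!tag.regionMatches(true, i, key, 0, key.length())) continue; // case-insensitive
            int start = i + key.length();
            char first = tag.charAt(start);
            if (first == '"' || first == '\'') {           // quoted value
                int end = tag.indexOf(first, start + 1);
                return end < 0 ? null : tag.substring(start + 1, end);
            }
            int end = start;                                // unquoted value ends at whitespace
            while (end < tag.length() && !Character.isWhitespace(tag.charAt(end))) end++;
            return tag.substring(start, end);
        }
        return null;
    }

    /** Returns the referenced file names in document order. */
    static List<String> findSources(String html) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(html));
        st.resetSyntax();       // same syntax table as in the article's constructor
        st.wordChars(0, 255);
        st.ordinaryChar('<');
        st.ordinaryChar('>');
        List<String> found = new ArrayList<>();
        boolean insideTag = false;
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == '<') {
                    insideTag = true;
                } else if (type == '>') {
                    insideTag = false;
                } else if (type == StreamTokenizer.TT_WORD && insideTag) {
                    String tag = st.sval.trim();
                    String upper = tag.toUpperCase(Locale.ROOT);
                    String value = null;
                    if (upper.startsWith("IMG") || upper.startsWith("FRAME")) {
                        value = attribute(tag, "src");
                    } else if (upper.startsWith("APPLET")) {
                        value = attribute(tag, "code");
                    } else if (upper.startsWith("BODY")) {
                        value = attribute(tag, "background");
                    }
                    if (value != null) found.add(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findSources("<body background=\"bg.gif\"><img src=\"a.gif\"></body>"));
    }
}
```

    Each returned name would still have to be resolved against the URL of the page it came from before it can be downloaded; that step is left out here.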


Platform: | Size: 1066 | Author: tiberxu | Hits:

[RichEdit] NetEase HTML Editor

Description: A very good editing tool, well suited to simple forums and guestbooks. It works like NetEase's web content editor.
Platform: | Size: 23355 | Author: 辉辉 | Hits:

[CSharp] Solving Some Tricky Problems When Building Web Programs in C Sharp

Description: First-hand experience with the problems commonly encountered when first learning C Sharp, especially by people who have not written web programs before. Provided in HTML format.
Platform: | Size: 88785 | Author: 小静 | Hits:

[Other resource] HTML starter

Description: An introductory HTML tutorial that covers the basics of the HTML language; suitable for beginners in web page authoring.
Platform: | Size: 78827 | Author: ding | Hits:

[Web Server] Implementing a Web Server in Java

Description: Implementing a Web server in Java. This article presents a method for implementing a Web server program that handles GET requests: create a ServerSocket object that listens on port 8080; wait for and accept client connections to port 8080; create the input and output streams associated with the socket; then read the client's request. If the request type is GET, extract the name of the requested HTML file from the request. If the HTML file exists, open it, send the HTTP headers and the file contents back to the Web browser through the socket, and close the file. Otherwise, send an error message to the Web browser. Finally, close the socket connected to that Web browser.
Platform: | Size: 10425 | Author: 雨岳 | Hits:
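
The steps listed in the entry above can be sketched as follows. This is an illustrative sketch only and not the code from the listed package: the class name TinyGetServer, its method names, the default of index.html for the path "/", and the choice of the 501 and 404 status codes for the error replies are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical single-threaded server that answers GET requests for files under a root directory.
class TinyGetServer {
    /** Extracts the file name from a line such as "GET /a.html HTTP/1.0"; null if not a GET. */
    static String requestedFile(String requestLine) {
        if (requestLine == null) return null;
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].equals("GET")) return null;
        String path = parts[1];
        int query = path.indexOf('?');
        if (query >= 0) path = path.substring(0, query);   // drop any query string
        if (path.startsWith("/")) path = path.substring(1);
        return path.isEmpty() ? "index.html" : path;       // assumed default document
    }

    /** Writes a status line, two headers, a blank line, and the body. */
    static void respond(OutputStream out, String status, byte[] body) throws IOException {
        String head = "HTTP/1.0 " + status + "\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: " + body.length + "\r\n\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }

    /** Serves one connection and closes the socket afterwards. */
    static void handle(Socket client, Path root) throws IOException {
        try (Socket s = client) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            OutputStream out = s.getOutputStream();
            String name = requestedFile(in.readLine());
            if (name == null) {
                respond(out, "501 Not Implemented",
                        "<h1>Only GET is supported</h1>".getBytes(StandardCharsets.US_ASCII));
                return;
            }
            Path file = root.resolve(name).normalize();
            // Refuse paths that escape the root directory (for example "../secret").
            if (file.startsWith(root) && Files.isRegularFile(file)) {
                respond(out, "200 OK", Files.readAllBytes(file));
            } else {
                respond(out, "404 Not Found",
                        "<h1>File not found</h1>".getBytes(StandardCharsets.US_ASCII));
            }
        }
    }

    /** Listens on the given port and handles clients one at a time until the process is stopped. */
    static void serve(int port, Path root) throws IOException {
        Path base = root.toAbsolutePath().normalize();
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket client = server.accept();
                try {
                    handle(client, base);
                } catch (IOException | RuntimeException e) {
                    System.err.println("Request failed: " + e.getMessage());
                }
            }
        }
    }
}
```

Calling TinyGetServer.serve(8080, java.nio.file.Paths.get(".")) would serve the current directory on port 8080, the port named in the description. The sketch handles one client at a time; a server meant for real use would hand each connection to its own thread.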

[Windows Develop] Adding an HTML View

Description: This program uses an HTML view to add the functionality of the web browser control, and uses the ongoback and ongoforward functions to navigate backward and forward through previously visited sites.
Platform: | Size: 35033 | Author: 邢馨华 | Hits:

[Other Web Code] The Most Complete Online HTML Editor

Description: Currently the most feature-complete online HTML editor. It includes an HTML editor, a code writer, a scrollbar editor, a web page color-scheme tool, a color picker, a forum repost tool, a JS script encryptor, and other tools. It is 58k in size, written entirely in HTML, and well suited to web designers.
Platform: | Size: 62300 | Author: eric | Hits:

[ADO-ODBC] hbtest4

Description: 1. J2EE-based multi-tier architecture: built around J2EE middleware, combining relational database technology, object-oriented technology, Web application technology, and related areas. 2. Basic languages: Java as the main programming language; HTML as the standard language for describing Web pages; XML for configuration; JSP 2.0 as the main programming language for Web components. The difficult part: 3. Multi-server platform setup and tool usage: MySQL 5.0 as the relational database, with its management tools; Apache Tomcat 5.5 as the Web server, with its management tools and JDBC driver; Apache Ant 1.6.5 as the project management tool; Hibernate 3.1.2 as the object-to-database mapping API.
Platform: | Size: 3798810 | Author: 易水寒萧 | Hits:

[Dialog_Window] 45335HTML2TXT

Description: HTML2TXT conversion tool, used for parsing HTML web pages.
Platform: | Size: 72305 | Author: 八云 | Hits:

[TCP/IP stack] TCP Web Server

Description: Builds a web server that simulates the TCP protocol. By loading data into a TCP socket, it can send files in formats such as .html, .jpg, and .jif to the client.
Platform: | Size: 8444 | Author: 木头 | Hits:

[Web Server] book-D-Html

Description: Tutorials and examples for dynamic web pages; quite good. Shared here for everyone.
Platform: | Size: 157701 | Author: sunjl | Hits:

[WEB Code] html-code

Description: Calculator-html Fixed Header Code Responsive HTML Template Code calculator-sample sample-web
Platform: | Size: 610304 | Author: wosabi | Hits:

[WEB Code] sample-web

Description: sample-web : html java
Platform: | Size: 91136 | Author: wosabi | Hits:

[android] XE7-WEB-SERVER (tested)

Description: A WEB service for Android on port 7777. The port can be changed freely and the HTML code can be replaced as needed. It also works when compiled with delphi X10.
Platform: | Size: 119808 | Author: 老周11 | Hits:

[Other] Online Store html_css Design (web+h5)

Description: These are a website's HTML pages, which can help others design online store pages.
Platform: | Size: 20030464 | Author: m123esrtfwe | Hits:

[WEB Code] 1728

Description: A site modeled on the 58同城 recruitment website, built with web front-end HTML + CSS + JavaScript, with human-computer interaction.
Platform: | Size: 181248 | Author: 凌云彻 | Hits:

[WEB Code] Web Front-End Development Quality Course: Advanced HTML and CSS Tutorial__莫振杰=

Description: The Advanced HTML and CSS Tutorial draws on the author's hands-on experience from extensive front-end and back-end development, organizes the knowledge systematically, distills the essentials, and addresses learners' pain points in plain language. The book aims to raise you from the level of a "self-taught web designer" to that of a "real front-end engineer". It has two parts: the first covers advanced HTML, mainly advanced HTML techniques and semantic HTML; the second covers advanced CSS, mainly CSS development techniques, code conventions, performance optimization, the nature of properties, and key concepts (such as containing blocks, BFC, and IFC). Besides explaining the material, the tutorial includes many development examples, emphasizes cultivating a programming mindset, and offers learners a smooth learning path.
Platform: | Size: 24869888 | Author: 图图i | Hits:

[Internet-Network] Python_Project

Description: This task uses regular expressions to parse the given HTML web page file of "The Merchant of Venice" and stores the file's contents in a file in Markdown format.
Platform: | Size: 47104 | Author: 哈哈飞 | Hits:

[JSP/Java] html+css+js: Dynamically Displaying the Time on a Web Page as XX:XX:XX

Description: Displays the time on a web page with JS, in the form XX:XX:XX.
Platform: | Size: 79872 | Author: kiiou | Hits:

[WEB Code] Blue HTML Website Template for a Network Technology Company

Description: Website source code; a front-end development example. A blue HTML website template for a network technology company.
Platform: | Size: 3496960 | Author: llmrlch | Hits:

CodeBus www.codebus.net